Distributed Link Sparsification for Scalable Scheduling Using Graph Neural Networks (Journal Version)
Zhongyuan Zhao, Gunjan Verma, Ananthram Swami, Santiago Segarra
In wireless networks characterized by dense connectivity, the significant signaling overhead generated by distributed link scheduling algorithms can exacerbate issues like congestion, energy consumption, and radio footprint expansion. To mitigate these challenges, we propose a distributed link sparsification scheme employing graph neural networks (GNNs) to reduce scheduling overhead for delay-tolerant traffic while maintaining network capacity. A GNN module is trained to adjust contention thresholds for individual links based on traffic statistics and network topology, enabling links to withdraw from scheduling contention when they are unlikely to succeed. Our approach is facilitated by a novel offline constrained unsupervised learning algorithm capable of balancing two competing objectives: minimizing scheduling overhead while ensuring that total utility meets the required level. In simulated wireless multi-hop networks with up to 500 links, our link sparsification technique effectively alleviates network congestion and reduces radio footprints across four distinct distributed link scheduling protocols.

Index Terms -- Threshold, massive access, scalable scheduling, graph neural networks, constrained unsupervised learning.

The proliferation of wireless devices and emerging machine-type communications (MTC) [2] has led to new requirements for next-generation wireless networks, including massive access in ultra-dense networks, spectrum and energy efficiencies, multi-hop connectivity, and scalability [3]-[6]. A promising solution to these challenges is self-organizing wireless multi-hop networks, which have been applied to scenarios where infrastructure is infeasible or overloaded, such as military communications, satellite communications, vehicular/drone networks, the Internet of Things (IoT), and 5G/6G (device-to-device (D2D), wireless backhaul, integrated access and backhaul (IAB)) [3]-[10].
Received 27 February 2024; revised 20 January 2025, 17 June 2025, and 13 August 2025; accepted 1 September 2025. Research was sponsored by the DEVCOM ARL Army Research Office and was accomplished under Cooperative Agreement Number W911NF-19-2-0269. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. Zhongyuan Zhao and Santiago Segarra are with the Department of Electrical and Computer Engineering, Rice University, USA.
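As a rough illustration of the withdrawal mechanism the abstract describes, the sketch below filters contending links by per-link thresholds. All names are hypothetical, and the trained GNN that produces the thresholds in the paper is replaced here by a plain input dictionary:

```python
def sparsify(utilities, thresholds):
    """Drop links whose local utility falls below their contention threshold.

    `utilities` maps link id -> local utility (e.g., a queue-weighted rate);
    `thresholds` maps link id -> per-link contention threshold. In the paper
    these thresholds come from a trained GNN; here they are plain inputs.
    A link below its threshold withdraws from scheduling contention and
    sends no contention messages, which is where the overhead savings come from.
    """
    return {link: u for link, u in utilities.items() if u >= thresholds[link]}

utilities = {"a": 0.9, "b": 0.2, "c": 0.6}
thresholds = {"a": 0.5, "b": 0.5, "c": 0.5}
contenders = sparsify(utilities, thresholds)
print(contenders)  # {'a': 0.9, 'c': 0.6}
print(len(contenders) / len(utilities))  # fraction of links still contending
```

The paper's contribution lies in learning the thresholds so that this filtering preserves total utility; the filtering step itself is this simple local comparison.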
Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning
Deep learning (DL) frameworks take advantage of GPUs to improve the speed of DL inference and training. Ideally, DL frameworks should be able to fully utilize the computation power of GPUs such that the running time depends on the amount of computation assigned to GPUs. Yet, we observe that in scheduling GPU tasks, existing DL frameworks suffer from inefficiencies such as large scheduling overhead and unnecessary serial execution. To this end, we propose Nimble, a DL execution engine that runs GPU tasks in parallel with minimal scheduling overhead. Nimble introduces a novel technique called ahead-of-time (AoT) scheduling. Evaluation on a variety of neural networks shows that compared to PyTorch, Nimble speeds up inference and training by up to 22.34× and 3.61×, respectively.
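The AoT idea can be sketched abstractly: traverse the task dependency graph once, record which tasks may run concurrently, and replay that plan on every run instead of re-scheduling. The sketch below (hypothetical `Task` type; Nimble itself operates on CUDA kernels and streams, not Python objects) computes such a plan with a level-by-level topological sort:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """Hypothetical GPU task; in a real engine this is a CUDA kernel launch."""
    name: str
    deps: list = field(default_factory=list)  # names of prerequisite tasks

def aot_schedule(tasks):
    """Compute a dependency-respecting execution plan once, ahead of time.

    Tasks whose dependencies are all satisfied are grouped into the same
    step and could be launched on parallel GPU streams; at run time the
    scheduler merely replays this precomputed plan.
    """
    done, plan = set(), []
    pending = {t.name: t for t in tasks}
    while pending:
        # every task with all prerequisites finished can run in parallel
        ready = sorted(n for n, t in pending.items()
                       if all(d in done for d in t.deps))
        if not ready:
            raise ValueError("cyclic dependency among tasks")
        plan.append(ready)
        done.update(ready)
        for n in ready:
            del pending[n]
    return plan

# Example: a diamond-shaped graph (conv -> two branches -> add)
tasks = [
    Task("conv"),
    Task("branch_a", ["conv"]),
    Task("branch_b", ["conv"]),
    Task("add", ["branch_a", "branch_b"]),
]
print(aot_schedule(tasks))  # [['conv'], ['branch_a', 'branch_b'], ['add']]
```

Because the plan is fixed before execution, the per-run cost of deciding what to launch next drops to a table lookup, which is the overhead reduction the abstract refers to.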
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark - RISE Lab
This work was done in collaboration with Ding Ding and Sergey Ermolin from Intel. In recent years, the scale of datasets and models used in deep learning has increased dramatically. Although larger datasets and models can improve the accuracy in many AI applications, they often take much longer to train on a single machine. Yet, unlike in the Big Data world, where distributed processing has long been standard, distributing training across large clusters is still uncommon with today's popular deep learning frameworks: access to a large GPU cluster is often hard to obtain, and these frameworks lack convenient facilities for distributed training. By leveraging the cluster distribution capabilities in Apache Spark, BigDL successfully performs very large-scale distributed training and inference.
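As a toy illustration of the kind of synchronous data-parallel training described above, the sketch below simulates several workers computing gradients on their own data partitions and averaging them before one shared update, assuming a simple least-squares model. All names are illustrative; BigDL's actual gradient aggregation runs on Spark executors, not in a local loop:

```python
import numpy as np

def distributed_sgd_step(w, partitions, lr=0.1):
    """One synchronous data-parallel step for least-squares regression.

    Each simulated worker computes the gradient on its own partition;
    the gradients are then averaged (the role Spark's aggregation plays
    in BigDL) and a single update is applied to the shared weights.
    """
    grads = [2 * X.T @ (X @ w - y) / len(y) for X, y in partitions]
    return w - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
partitions = [(X[i::4], y[i::4]) for i in range(4)]  # 4 simulated workers

w = np.zeros(3)
for _ in range(300):
    w = distributed_sgd_step(w, partitions)
print(np.round(w, 3))  # converges to true_w on this noiseless problem
```

The averaging step is what makes the update equivalent to a single large mini-batch step, which is why scaling out over a cluster preserves the training semantics.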